43 research outputs found
Join-Idle-Queue with Service Elasticity: Large-Scale Asymptotics of a Non-monotone System
We consider the model of a token-based joint auto-scaling and load balancing
strategy, proposed in a recent paper by Mukherjee, Dhara, Borst, and van
Leeuwaarden (SIGMETRICS '17, arXiv:1703.08373), which offers an efficient
scalable implementation and yet achieves asymptotically optimal steady-state
delay performance and energy consumption as the number of servers $N \to \infty$.
In the above work, the asymptotic results are obtained under the assumption
that the queues have fixed-size finite buffers, and therefore the fundamental
question of stability of the proposed scheme with infinite buffers was left
open. In this paper, we address this fundamental stability question. The system
stability under the usual subcritical load assumption is not automatic.
Moreover, the stability may not even hold for all $N$. The key challenge stems
from the fact that the process lacks monotonicity, which has been the powerful
primary tool for establishing stability in load balancing models. We develop a
novel method to prove that the subcritically loaded system is stable for large
enough $N$, and establish convergence of steady-state distributions to the
optimal one, as $N \to \infty$. The method goes beyond the state of the art
techniques -- it uses an induction-based idea and a "weak monotonicity"
property of the model; this technique is of independent interest and may have
broader applicability.
Comment: 30 pages
Scalable Load Balancing Algorithms in Networked Systems
A fundamental challenge in large-scale networked systems viz., data centers
and cloud networks is to distribute tasks to a pool of servers, using minimal
instantaneous state information, while providing excellent delay performance.
In this thesis we design and analyze load balancing algorithms that aim to
achieve a highly efficient distribution of tasks, optimize server utilization,
and minimize communication overhead.
Comment: Ph.D. thesis
Supermarket Model on Graphs
We consider a variation of the supermarket model in which the servers can
communicate with their neighbors and where the neighborhood relationships are
described in terms of a suitable graph. Tasks with unit-exponential service
time distributions arrive at each vertex as independent Poisson processes with
rate $\lambda$, and each task is irrevocably assigned to the shortest queue
among the one where it first appears and its $d-1$ randomly selected neighbors.
This model has been extensively studied when the underlying graph is a clique,
in which case it reduces to the well-known power-of-$d$ scheme. In particular,
results of Mitzenmacher (1996) and Vvedenskaya et al. (1996) show that as the
size of the clique gets large, the occupancy process associated with the
queue-lengths at the various servers converges to a deterministic limit
described by an infinite system of ordinary differential equations (ODE). In
this work, we consider settings where the underlying graph need not be a clique
and is allowed to be suitably sparse. We show that if the minimum degree
approaches infinity (however slowly) as the number of servers approaches
infinity, and the ratio between the maximum degree and the minimum degree in
each connected component approaches 1 uniformly, the occupancy process
converges to the same system of ODE as the classical supermarket model. In
particular, the asymptotic behavior of the occupancy process is insensitive to
the precise network topology. We also study the case where the graph sequence
is random, with the $n$-th graph given as an Erd\H{o}s-R\'enyi random graph on
$n$ vertices with average degree $c(n)$. Annealed convergence of the occupancy
process to the same deterministic limit is established under the condition
$c(n) \to \infty$, and under a stronger condition $c(n)/\ln(n) \to \infty$,
convergence (in probability) is shown for almost every realization of the
random graph.
Comment: 32 pages
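For orientation, the deterministic limit mentioned in this abstract can be sketched numerically for the clique case. The block below assumes the classical mean-field ODE for the power-of-$d$ scheme, dq_i/dt = lambda (q_{i-1}^d - q_i^d) - (q_i - q_{i+1}), where q_i is the fraction of servers with at least i tasks, and compares a simple Euler integration with the known fixed point q_i = lambda^((d^i - 1)/(d - 1)). The function names, the truncation depth, and the step size are illustrative assumptions, not taken from the paper.

```python
def fixed_point(lam, d, depth):
    """Closed-form fixed point of the power-of-d mean-field ODE (requires d >= 2)."""
    return [lam ** ((d ** i - 1) / (d - 1)) for i in range(depth + 1)]


def integrate(lam, d, depth, t_end=200.0, dt=0.01):
    """Euler integration of the truncated ODE, starting from an empty system.

    q[i] is the fraction of servers with at least i tasks; q[0] is always 1.
    Levels above `depth` are treated as empty (a truncation assumption).
    """
    q = [1.0] + [0.0] * depth
    for _ in range(int(t_end / dt)):
        new = q[:]
        for i in range(1, depth + 1):
            above = q[i + 1] if i < depth else 0.0
            arrivals = lam * (q[i - 1] ** d - q[i] ** d)  # tasks joining level i
            departures = q[i] - above                     # servers dropping below level i
            new[i] = q[i] + dt * (arrivals - departures)
        q = new
    return q


if __name__ == "__main__":
    lam, d, depth = 0.7, 2, 10
    for i, (num, exact) in enumerate(zip(integrate(lam, d, depth),
                                         fixed_point(lam, d, depth))):
        print(i, round(num, 6), round(exact, 6))
```

The doubly exponential decay of the fixed point in i is what makes a small truncation depth adequate here; for loads closer to 1, a larger depth and a longer time horizon would likely be needed.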
Phase transitions of extremal cuts for the configuration model
The $k$-section width and the Max-Cut for the configuration model are shown
to exhibit phase transitions according to the values of certain parameters of
the asymptotic degree distribution. These transitions mirror those observed on
Erd\H{o}s-R\'enyi random graphs, established by Luczak and McDiarmid (2001),
and Coppersmith et al. (2004), respectively.
Optimal Rate-Matrix Pruning For Large-Scale Heterogeneous Systems
We present an analysis of large-scale load balancing systems, where the
processing time distribution of tasks depends on both the task and server
types. Our study focuses on the asymptotic regime, where the number of servers
and task types tend to infinity in proportion. In heterogeneous environments,
commonly used load balancing policies such as Join Fastest Idle Queue and Join
Fastest Shortest Queue exhibit poor performance and even shrink the stability
region. Interestingly, prior to this work, finding a scalable policy with a
provable performance guarantee in this setup remained an open question.
To address this gap, we propose and analyze two asymptotically delay-optimal
dynamic load balancing policies. The first policy efficiently reserves the
processing capacity of each server for ``good'' tasks and routes tasks using the
vanilla Join Idle Queue policy. The second policy, called the speed-priority
policy, significantly increases the likelihood of assigning tasks to the
respective ``good'' servers capable of processing them at high speeds. By
leveraging a framework inspired by the graphon literature and employing the
mean-field method and stochastic coupling arguments, we demonstrate that both
policies achieve asymptotic zero queuing. Specifically, as the system scales,
the probability of a typical task being assigned to an idle server approaches
1.
Large deviations analysis for the queue in the Halfin-Whitt regime
We consider the FCFS queue in the Halfin-Whitt heavy traffic
regime. It is known that the normalized sequence of steady-state queue length
distributions is tight and converges weakly to a limiting random variable W.
However, those works only describe W implicitly as the invariant measure of a
complicated diffusion. Although it was proven by Gamarnik and Stolyar that the
tail of W is sub-Gaussian, the actual value of the corresponding tail exponent
was left open. In subsequent work, Dai and He
conjectured an explicit form for this exponent, which was insensitive to the
higher moments of the service distribution.
We explicitly compute the true large deviations exponent for W when the
abandonment rate is less than the minimum service rate, the first such result
for non-Markovian queues with abandonments. Interestingly, our results resolve
the conjecture of Dai and He in the negative. Our main approach is to extend
the stochastic comparison framework of Gamarnik and Goldberg to the setting of
abandonments, requiring several novel and non-trivial contributions. Our
approach sheds light on several novel ways to think about multi-server queues
with abandonments in the Halfin-Whitt regime, which should hold in considerable
generality and provide new tools for analyzing these systems.
Join-the-Shortest Queue Diffusion Limit in Halfin-Whitt Regime: Tail Asymptotics and Scaling of Extrema
Consider a system of $N$ parallel single-server queues with unit-exponential
service time distribution and a single dispatcher where tasks arrive as a
Poisson process of rate $\lambda(N)$. When a task arrives, the dispatcher
assigns it to one of the servers according to the Join-the-Shortest Queue (JSQ)
policy. Eschenfeldt and Gamarnik (2015) established that in the Halfin-Whitt
regime where $(N - \lambda(N))/\sqrt{N} \to \beta > 0$ as $N \to \infty$, the
appropriately scaled occupancy measure of the system under the JSQ policy
converges weakly on any finite time interval to a certain diffusion process as
$N \to \infty$. Recently, it was further established by Braverman (2018) that
the stationary occupancy measure of the system converges weakly to the steady
state of the diffusion process as $N \to \infty$.
In this paper we perform a detailed analysis of the steady state of the above
diffusion process. Specifically, we establish precise tail-asymptotics of the
stationary distribution and scaling of extrema of the process on large
time intervals. Our results imply that the asymptotic steady-state scaled number
of servers with queue length two or larger exhibits an exponential tail,
whereas that for the number of idle servers turns out to be Gaussian. From the
methodological point of view, the diffusion process under consideration goes
beyond the state-of-the-art techniques in the study of the steady-state of
diffusion processes. Lack of any closed form expression for the steady state
and intricate interdependency of the process dynamics on its local times make
the analysis significantly challenging. We develop a technique involving the
theory of regenerative processes that provides a tractable form for the
stationary measure, and in conjunction with several sharp hitting time
estimates, acts as a key vehicle in establishing the results.
Comment: 41 pages; To appear in the Annals of Applied Probability
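The pre-limit system described in this abstract can be explored with a small simulation. The sketch below assumes $N$ unit-exponential servers, Poisson arrivals of rate lambda(N) = N - beta sqrt(N), and JSQ dispatching with ties broken uniformly at random; it simulates only the embedded jump chain (event order, not event times). The function name, parameters, and tie-breaking rule are illustrative assumptions and are not part of the paper's analysis of the diffusion limit.

```python
import math
import random


def simulate_jsq(n, beta, num_events, seed=0):
    """Simulate the jump chain of an n-server JSQ system in the Halfin-Whitt regime.

    Returns the final queue lengths and the numbers of arrivals and departures.
    """
    rng = random.Random(seed)
    lam = n - beta * math.sqrt(n)  # total arrival rate; must be positive
    queues = [0] * n
    arrivals = departures = 0
    for _ in range(num_events):
        busy = [i for i, q in enumerate(queues) if q > 0]
        total_rate = lam + len(busy)  # each busy server completes at rate 1
        if rng.random() * total_rate < lam:
            # Arrival: join a shortest queue, ties broken uniformly at random.
            shortest = min(queues)
            candidates = [i for i, q in enumerate(queues) if q == shortest]
            queues[rng.choice(candidates)] += 1
            arrivals += 1
        else:
            # Departure from a uniformly chosen busy server.
            queues[rng.choice(busy)] -= 1
            departures += 1
    return queues, arrivals, departures


if __name__ == "__main__":
    n, beta = 100, 1.0
    queues, _, _ = simulate_jsq(n, beta, num_events=20000, seed=1)
    idle = sum(1 for q in queues if q == 0)
    two_plus = sum(1 for q in queues if q >= 2)
    # Scaled by sqrt(n), these are the two quantities whose tails the abstract discusses.
    print(idle / math.sqrt(n), two_plus / math.sqrt(n))
```

A single end-of-run snapshot like this is only a rough illustration; estimating stationary tails would need long runs with time-weighted averaging, which this sketch deliberately omits.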
Distributed Rate Scaling in Large-Scale Service Systems
We consider a large-scale parallel-server system, where each server
independently adjusts its processing speed in a decentralized manner. The
objective is to minimize the overall cost, which comprises the average cost of
maintaining the servers' processing speeds and a non-decreasing function of the
tasks' sojourn times. The problem is compounded by the lack of knowledge of the
task arrival rate and the absence of a centralized control or communication
among the servers. We draw on ideas from stochastic approximation and present a
novel rate scaling algorithm that ensures convergence of all server processing
speeds to the globally asymptotically optimum value as the system size
increases. Apart from the algorithm design, a key contribution of our approach
lies in demonstrating how concepts from the stochastic approximation literature
can be leveraged to effectively tackle learning problems in large-scale,
distributed systems. En route, we also analyze the performance of a fully
heterogeneous parallel-server system, where each server has a distinct
processing speed, which might be of independent interest.
Comment: 32 pages, 4 figures
Asymptotically Optimal Load Balancing Topologies
We consider a system of $N$ servers inter-connected by some underlying graph
topology $G_N$. Tasks arrive at the various servers as independent Poisson
processes of rate $\lambda$. Each incoming task is irrevocably assigned to
whichever server has the smallest number of tasks among the one where it
appears and its neighbors in $G_N$. Tasks have unit-mean exponential service
times and leave the system upon service completion.
The above model has been extensively investigated in the case $G_N$ is a
clique. Since the servers are exchangeable in that case, the queue length
process is quite tractable, and it has been proved that for any $\lambda < 1$,
the fraction of servers with two or more tasks vanishes in the limit as
$N \to \infty$. For an arbitrary graph $G_N$, the lack of exchangeability severely
complicates the analysis, and the queue length process tends to be worse than
for a clique. Accordingly, a graph $G_N$ is said to be $N$-optimal or
$\sqrt{N}$-optimal when the occupancy process on $G_N$ is equivalent to that on
a clique on an $N$-scale or $\sqrt{N}$-scale, respectively.
We prove that if $G_N$ is an Erd\H{o}s-R\'enyi random graph with average
degree $d(N)$, then it is with high probability $N$-optimal and
$\sqrt{N}$-optimal if $d(N) \to \infty$ and $d(N)/(\sqrt{N} \log(N)) \to \infty$
as $N \to \infty$, respectively. This demonstrates that optimality can
be maintained at $N$-scale and $\sqrt{N}$-scale while reducing the number of
connections by nearly a factor $N$ and $\sqrt{N}/\log(N)$ compared to a
clique, provided the topology is suitably random. It is further shown that if
$G_N$ contains $\Theta(N)$ bounded-degree nodes, then it cannot be $N$-optimal.
In addition, we establish that an arbitrary graph $G_N$ is $N$-optimal when its
minimum degree is $N - o(N)$, and may not be $N$-optimal even when its minimum
degree is $cN + o(N)$ for any $0 < c < 1/2$.
Comment: A few relevant results from arXiv:1612.00723 are included for
convenience